Tsirelson's bound from a Generalised Data Processing Inequality 






Oscar C. O. Dahlsten,1,2,3 Daniel Lercher,1,4 and Renato Renner1

1 Institute for Theoretical Physics, ETH Zurich, 8093 Zurich, Switzerland 
2 Center for Quantum Technologies, National University of Singapore, Republic of Singapore 
3 Atomic and Laser Physics, Clarendon Laboratory,
University of Oxford, Parks Road, Oxford OX1 3PU, United Kingdom
4 Department of Mathematics, Technische Universität München, 85748 Garching, Germany

(Dated: July 4, 2012) 

The strength of quantum correlations is bounded from above by Tsirelson's bound. We establish 
a connection between this bound and the fact that correlations between two systems cannot increase 
under local operations, a property known as the data processing inequality. More specifically, we 
consider arbitrary convex probabilistic theories. These can be equipped with an entropy measure 
that naturally generalizes the von Neumann entropy, as shown recently in [Short and Wehner; Barnum et al.].
We prove that if the data processing inequality holds with respect to this generalized
entropy measure then the underlying theory necessarily respects Tsirelson's bound. We moreover 
generalise this statement to any entropy measure satisfying certain minimal requirements. A con- 
sequence of our result is that not all of the entropic relations used to derive Tsirelson's bound via 
information causality in [Pawłowski et al.] are necessary.

PACS numbers: 03.65.Ta, 03.65.Ud 
Introduction. — Quantum mechanics departs fundamentally from any classical theory by allowing non-local correlations. The existence of such correlations has been extensively verified in experiments (up to a few loopholes). As was shown by Bell, these correlations imply that the world is not both local and realist, two standard assumptions underpinning the classical mechanical world-view. Apart from their fundamental theoretical interest, non-local correlations are also of technological importance, for example as the essential ingredient in Ekert-style quantum cryptographic schemes.

However there is a limit to how much local realism is violated. The strength of quantum correlations is itself upper bounded by Tsirelson's bound [4, 5]. This is a non-trivial bound, because it is conceivable to violate Bell inequalities more strongly than quantum theory does without having a theory which is signalling (i.e. which allows instantaneous information transfer across space). For example it is possible to conceive of PR-boxes, also known as non-local boxes, hypothetical systems which maximally violate the CHSH Bell inequality without being signalling.

The question then arises as to whether one can asso- 
ciate a fundamental assumption about nature other than 
non-signalling with Tsirelson's bound. Such an assump- 
tion could then be labelled a fundamental principle un- 
derpinning quantum theory, and possibly form part of 
a much-sought-for set of principles from which quantum 
theory could be derived. 

There has already been significant effort in this direction. For example, it is now known that the existence of maximally Bell-violating correlations would lead to some communication complexity problems becoming trivial, the possibility of oblivious transfer, weaker uncertainty relations [10], general invalidation of quantum theory locally [11] and severely limited dynamics [12, 13].

FIG. 1: The Data Processing Inequality states that the correlations between A and B cannot increase under a local operation T on B. More specifically $H(A|B) \le H(A|T(B))$.

A recent string of related papers has moreover been concerned with a principle called information causality [14-16]. A great advantage of this principle is that the exact Tsirelson bound is recovered, i.e. it rules out any stronger correlations, not just the maximally strong ones. The principle amounts to placing a limit on how well two separated parties can perform in a particular game (van Dam's game [9]) where they share a resource state. This limits the resource state in such a way that Tsirelson's bound is recovered. Whilst the original interpretation of information causality as a particularly simple generalisation of non-signalling has been questioned (see e.g. [13]), the principle is, as mentioned above, powerful.

Intriguingly, in the proof that information causality holds in quantum theory, a specific limited set of information-theoretic theorems is used. One may thus replace information causality as a postulate with those information-theoretic theorems. This is attractive if one seeks an information-theoretic set of principles for quantum theory. In order to discuss the validity of such theorems outside of quantum theory, however, one needs definitions of the relevant entropies for general probabilistic theories. Fortuitously, such definitions were recently proposed and investigated in [17]-[19]. In [17], [18] information causality is also discussed. In [18] three sufficient conditions under which a generalised probabilistic theory respects information causality are determined. In [17] it is shown that if one follows the information causality proof in the case of box-world (the theory with PR-boxes and all other non-signalling distributions), the proof breaks down at the point where one needs to assume the so-called strong subadditivity of entropy. An alternative approach to deriving information causality from more basic entropic principles appears in [20]. These recent works, when taken together, suggest that one may hope for a small and operationally motivated set of information-theoretic relations from which Tsirelson's bound, and perhaps even quantum theory, can be derived.

We here investigate the Data Processing Inequality as such a principle. This essentially states that correlations, quantified via conditional entropies, cannot increase under local operations, see Fig. 1. In order to define this in general, we use an entropy proposed in [17], which naturally generalizes the von Neumann entropy (and reduces to the latter in the case of quantum theory). We prove that, surprisingly, this generalised Data Processing Inequality alone implies Tsirelson's bound.

We proceed as follows. We firstly describe the frame- 
work of generalised probabilistic theories within which 
we work. Then we define Tsirelson's bound as well as 
information causality. We go on to describe how to define entropy in an operational manner as in [17]. This is
used to define the generalised Data Processing Inequality (DPI). We then prove that DPI implies Tsirelson's
bound. This involves proving a more general theorem of 
which the main result is a corollary. Finally we compare 
the results to previous ones and discuss the implications 
and interpretation of the principle. 

Convex, operational, probabilistic theories. — 
We use the framework of convex probabilistic theories [12, 21, 22]. This amounts to taking the minimalistic pragmatic view that the operational content of a theory is in the predicted statistics of measurement outcomes.

The state of a system by definition determines the probabilities of all possible measurement outcomes. The state is completely specified, again by definition, by the probabilities for the outcomes of $k$ so-called fiducial measurements $0, \dots, k-1$. The number $k$ may be significantly smaller than the total number of measurements (e.g. in quantum theory there is a continuum of measurements but $k = d^2$ for a state on a Hilbert space of dimension $d$). If these fiducial measurements each have $l$ possible outcomes $0, \dots, l-1$ we will say that the system is of type $(k, l)$.

We can thus write a (normalised) state as a list of $P(i|j)$, denoting the probability of getting outcome $i$ if fiducial measurement $j$ is performed. We represent this list by $P$. The normalisation of the state is $|P| := \sum_i P(i|j)$ and is, for all valid states, independent of the choice of fiducial measurement $j$. A state is said to be normalised if $|P| = 1$ and subnormalised if $|P| < 1$.

We assume that the set of allowed normalized states $S$ is closed and convex (so that any probabilistic mixture of states is an allowed state). We say a state is pure if it cannot be written as a convex mixture of other states. A theory is defined by the set of allowed states, $S$, as well as the set of allowed transformations.

Transformations take states to states. They must be linear, as probabilistic mixtures of different states must be conserved [12]. Transformations can thus be modelled as $P \mapsto M \cdot P$, where $M$ is a matrix. If one performs a measurement with several outcomes, each outcome is associated with a certain transform $M_i$. The unnormalized state associated with the $i$-th outcome is $M_i \cdot P$, and the associated probability of the $i$-th outcome is given by the normalisation factor after the transformation: $|M_i \cdot P|$.

If one is only interested in the probabilities of the different outcomes of a measurement, one can always associate with a transformation $\{M_i\}$ a set of vectors $\{R_i\}$ such that $R_i \cdot P = |M_i \cdot P| \;\; \forall P \in S$. Consequently, for a normalized state $P$, $R_i \cdot P$ is the probability of the $i$-th outcome.

It is also possible to combine single systems to form 
multipartite systems. If one performs local operations on 
the systems A and B the final unnormalized state of the 
joint system does by assumption not depend on the tem- 
poral ordering of the operations. A direct consequence 
of this is the no-signaling principle: measuring system B 
cannot give information about what transformation was 
applied on A.

We will make the non-trivial but standard assump- 
tion that the global state of a bipartite system can be 
completely determined by specifying joint probabilities 
of outcomes for fiducial measurements performed simul- 
taneously on each subsystem. Accordingly, the joint state 
of two parties is uniquely specified by the list $P(ii'|jj')$, denoting the probability of getting the outcomes $i$ and $i'$ if one performs fiducial measurement $j$ on A and $j'$ on B.

For a joint state $P_{AB}$, the marginal (also called reduced) state of system A, denoted $P_A$, is given by $P_A(i|j) = \sum_{i'} P_{AB}(ii'|jj')$. Similarly, the conditional marginal state $P_{A|B:k,l}$ is defined by

$$P_{A|B:k,l}(i|j) = \frac{P_{AB}(ik|jl)}{P_B(k|l)}. \qquad (1)$$

This represents the state of system A after a fiducial measurement $l$ was performed on system B and the outcome $k$ was obtained.
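
As an illustration of the marginal and conditional marginal states, the following Python sketch stores a bipartite state as a table of $P(ii'|jj')$ and evaluates both expressions. The example state (a PR-box on two (2,2) systems), the dictionary layout and all function names are assumptions made for this illustration only.

```python
import itertools

# Joint state P_AB(i i' | j j') of two (2,2) systems, stored as a dict keyed by
# (i, i_prime, j, j_prime). Illustrative choice: a PR-box, whose outcomes
# satisfy i XOR i' = j AND j'.
P_AB = {
    (i, ip, j, jp): (0.5 if (i ^ ip) == (j & jp) else 0.0)
    for i, ip, j, jp in itertools.product((0, 1), repeat=4)
}

def marginal_A(P, i, j, jp=0):
    """P_A(i|j) = sum_{i'} P_AB(i i'|j j'); no-signalling makes jp irrelevant."""
    return sum(P[(i, ip, j, jp)] for ip in (0, 1))

def marginal_B(P, k, l, j=0):
    """P_B(k|l) = sum_i P_AB(i k|j l)."""
    return sum(P[(i, k, j, l)] for i in (0, 1))

def conditional_A(P, k, l):
    """Conditional marginal state P_{A|B:k,l}(i|j), following Eq. (1)."""
    norm = marginal_B(P, k, l)
    return {(i, j): P[(i, k, j, l)] / norm for i in (0, 1) for j in (0, 1)}

reduced = {(i, j): marginal_A(P_AB, i, j) for i in (0, 1) for j in (0, 1)}
conditioned = conditional_A(P_AB, k=0, l=1)
```

For this example state the reduced state of A is uniformly random, whereas conditioning on outcome $k = 0$ of measurement $l = 1$ on B leaves A in a state with a definite outcome for each fiducial measurement.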

It was shown in [12] that, denoting the vector spaces containing the vectors $P_{AB}$, $P_A$ and $P_B$ by $V_{AB}$, $V_A$ and $V_B$ respectively, one can relate the spaces by $V_{AB} = V_A \otimes V_B$ ($\otimes$ being the tensor product). One assumes that for $P_A \in S_A$ and $P_B \in S_B$ we have $P_A \otimes P_B \in S_{AB}$. This implies that any $P_{AB} \in S_{AB}$ can be written as $P_{AB} = \sum_i r_i P_A^i \otimes P_B^i$ with $P_A^i \in S_A$ and $P_B^i \in S_B$ normalized and pure and $r_i \in \mathbb{R}$ [13].

For a transformation on system A defined by $P_A \mapsto P_{A'} = M_A \cdot P_A$, the transformation of the joint system is given by $P_{AB} \mapsto P_{A'B} = (M_A \otimes \mathbb{1}) \cdot P_{AB}$. We demand that transformations $M_A$ on any system A are well-defined, meaning $(M_A \otimes \mathbb{1}_B) \cdot P_{AB} \in S_{A'B}$ whenever $P_{AB} \in S_{AB}$, for all types of system B.

In the following, we will always assume that the set of 
transformations allowed by the theory includes removing 
systems (which corresponds to taking the marginal state, 
as defined above) and adding a system, taking $P_A \mapsto P_A \otimes P_B$.

We also demand that the theory contains 'classical' systems of type $(1, d)$ for all $d \in \mathbb{N}$. We call the trivial classical system of type (1, 1) the vacuum (V). In our proofs we shall, taking inspiration from [21], use the fact that the state of a classical system can be cloned; see the technical supplement.

As shown e.g. in [22], finite dimensional quantum theory as well as classical probability theory fit into this framework. So does box-world [12], which allows all states on discrete sets of measurements that are non-signalling. The simplest non-trivial example of this is for elementary systems of type (2,2). The joint state space of two such systems includes PR-boxes. A key difference between box-world and quantum theory is that only the latter respects Tsirelson's bound.

Tsirelson's bound. — The quantum correlation 
strength as quantified by the CHSH Bell inequality [23]
is upper bounded by Tsirelson's bound [4, 5].

Definition 1 (Tsirelson's bound). Consider two systems A and B, with two choices of measurements (0 or 1) and two outputs each ($a$ and $b$). Define the quantity

$$S := p(a = b|00) + p(a = b|01) + p(a = b|10) + p(a \neq b|11).$$

The theory governing the systems is said to satisfy Tsirelson's bound if $2 - \sqrt{2} \le S \le 2 + \sqrt{2}$ for any states allowed by the theory.

A PR-box (also known as a non-local box) is designed to have $S = 0$ or $4$, thus maximally violating the Tsirelson bound. It is defined (up to relabellings of measurement choices and outcomes) to be a state where

$$p(a = b|00) = p(a = b|01) = p(a = b|10) = p(a \neq b|11) = 1$$

and the local marginal states are uniformly random.
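
To make the quantity $S$ concrete, the following Python sketch evaluates it for a PR-box and for all deterministic local strategies, and checks that the PR-box has input-independent marginals. The function names and the representation of a box as a function `p(a, b, x, y)` are assumptions of this sketch, not notation from the text.

```python
import itertools
import math

def pr_box(a, b, x, y):
    """p(a, b | x, y) for a PR-box: a XOR b = x AND y, uniform marginals."""
    return 0.5 if (a ^ b) == (x & y) else 0.0

def chsh_S(p):
    """S = p(a=b|00) + p(a=b|01) + p(a=b|10) + p(a!=b|11)."""
    total = 0.0
    for x, y in itertools.product((0, 1), repeat=2):
        for a, b in itertools.product((0, 1), repeat=2):
            counted = (a != b) if (x, y) == (1, 1) else (a == b)
            if counted:
                total += p(a, b, x, y)
    return total

def deterministic(fa, fb):
    """Local deterministic box: output a = fa[x] on A and b = fb[y] on B."""
    return lambda a, b, x, y: 1.0 if (a == fa[x] and b == fb[y]) else 0.0

S_pr = chsh_S(pr_box)
S_local_max = max(
    chsh_S(deterministic(fa, fb))
    for fa in itertools.product((0, 1), repeat=2)
    for fb in itertools.product((0, 1), repeat=2)
)
tsirelson_upper = 2 + math.sqrt(2)

# No-signalling check: the marginal of A must not depend on B's input y.
no_signalling = all(
    math.isclose(sum(pr_box(a, b, x, 0) for b in (0, 1)),
                 sum(pr_box(a, b, x, 1) for b in (0, 1)))
    for a in (0, 1) for x in (0, 1)
)
```

In this sketch the deterministic local strategies reach at most $S = 3$ and the PR-box reaches $S = 4$, with the Tsirelson value $2 + \sqrt{2}$ lying strictly between the two.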

Information causality. — Let there be two space-like 
separated parties, Alice and Bob, who share an arbitrary no-signaling resource. Alice then receives a random bit-string $a = (a_0, \dots, a_{N-1})$, which is not known to Bob. The bits $a_i$ are unbiased and independently distributed. At the same time Bob gets a random variable $b \in \{0, \dots, N-1\}$, which is unknown to Alice. Alice is free to make use of her local resources in order to prepare a classical bit-string $x$ of length $m$ which she sends to Bob. Bob, having received Alice's message, is then asked to guess the value of $a_b$ as best as he can. Let us denote Bob's guess by $\beta$. The efficiency of Alice's and Bob's strategy can be quantified by $I = \sum_i I_{Sh}(a_i : \beta | b = i)$, where $I_{Sh}(a_i : \beta | b = i)$ is the Shannon mutual information between $a_i$ and $\beta$, computed under the condition that Bob has received $b = i$.

Definition 2 (Information Causality). A theory is said to respect information causality if in the above game $I \le m$ for any allowed resource state.

It was shown in [14] that information causality implies Tsirelson's bound.
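
The game can be simulated directly for the smallest case $N = 2$, $m = 1$. The Python sketch below assumes a particular strategy (Alice inputs $a_0 \oplus a_1$ into a shared box and sends $x = a_0 \oplus A$; Bob inputs $b$ and guesses $\beta = x \oplus B$) and a particular one-parameter family of boxes that satisfy $A \oplus B = $ (input of Alice) $\cdot$ (input of Bob) with probability `p`, for which $S = 4p$. Both the strategy and the box family are assumptions of this illustration; the function names are hypothetical.

```python
import itertools
import math

def noisy_box(A, B, x, y, p):
    """Box with uniform marginals that satisfies A XOR B = x AND y w.p. p."""
    return p / 2 if (A ^ B) == (x & y) else (1 - p) / 2

def mutual_information(joint):
    """Shannon mutual information in bits from a dict {(u, v): probability}."""
    pu, pv = {}, {}
    for (u, v), q in joint.items():
        pu[u] = pu.get(u, 0.0) + q
        pv[v] = pv.get(v, 0.0) + q
    return sum(q * math.log2(q / (pu[u] * pv[v]))
               for (u, v), q in joint.items() if q > 0)

def ic_quantity(p):
    """I = sum_i I_Sh(a_i : beta | b = i) for N = 2 bits and an m = 1 bit message."""
    total = 0.0
    for i in (0, 1):                      # Bob's input b = i
        joint = {}
        for a0, a1, A, B in itertools.product((0, 1), repeat=4):
            prob = 0.25 * noisy_box(A, B, a0 ^ a1, i, p)
            message = a0 ^ A              # Alice's one-bit message x
            beta = message ^ B            # Bob's guess
            key = ((a0, a1)[i], beta)
            joint[key] = joint.get(key, 0.0) + prob
        total += mutual_information(joint)
    return total

I_pr = ic_quantity(1.0)                           # PR-box, S = 4
I_tsirelson = ic_quantity((2 + math.sqrt(2)) / 4)  # box with S = 2 + sqrt(2)
```

With a PR-box this strategy gives $I = 2 > m = 1$, violating information causality, while the box with $S = 2 + \sqrt{2}$ stays below $m$ in this protocol. A single-box protocol of this kind is only an illustration and does not by itself reproduce the derivation of the exact bound in [14].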

General entropy definition. — We now recount cer- 
tain results from recent research into how to quantify 
entropy in general probabilistic theories [17-19]. We shall in particular use a definition of entropy for general theories from [17] which is based on the Shannon entropy. This is highly analogous to how the von Neumann entropy generalises the Shannon entropy $H_{Sh}(P) = -\sum_i P_i \log P_i$ to the quantum case. The intuition is that the von Neumann entropy is the minimal Shannon entropy over all measurements; more precisely, the minimum is over all fine-grained measurements (explained below).

Note that one can in general define the Shannon entropy associated with a measurement $e$ as $H_{Sh}(e(P)) = -\sum_i (R_i^e \cdot P) \log(R_i^e \cdot P)$, where $R_i^e$ is the vector associated with the $i$-th outcome of $e$.

Definition 3 (Entropy [17]). For every normalized state $P \in S$ the entropy $H(P)$ is given by

$$H(P) := \inf_{e \in \mathcal{M}^*} H_{Sh}(e(P)). \qquad (2)$$

Here $e(P)$ denotes the classical probability distribution for the different outcomes of $e$, and the minimization is over the set of all fine-grained measurements $\mathcal{M}^*$.
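
The minimisation in Definition 3 can be illustrated in the quantum case with a single qubit. The Python sketch below assumes a state that is diagonal with eigenvalues $(p, 1-p)$ and restricts the minimisation to projective measurements in a real basis rotated by an angle `theta`; this restriction, the grid search, the use of base-2 logarithms and all names are assumptions of the sketch rather than part of the definition.

```python
import math

def shannon(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def outcome_probs(p, theta):
    """Outcome distribution of a projective measurement, in a basis rotated by
    theta, performed on the qubit state diag(p, 1 - p)."""
    q = p * math.cos(theta) ** 2 + (1 - p) * math.sin(theta) ** 2
    return (q, 1 - q)

p = 0.9
thetas = [k * (math.pi / 2) / 1000 for k in range(1001)]

# Minimal Shannon entropy over the family of measurements considered here.
H_measurement = min(shannon(outcome_probs(p, t)) for t in thetas)

# Entropy of the eigenvalue distribution, i.e. the von Neumann entropy.
H_von_neumann = shannon((p, 1 - p))
```

Within this family the minimum is attained by measuring in the eigenbasis (`theta = 0`), where the Shannon entropy of the outcomes coincides with the von Neumann entropy; a measurement at `theta = pi/4` instead yields one full bit.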

$\mathcal{M}^*$ above is defined to be the set of measurements which have no non-trivial fine-grainings. A fine-graining is a subdivision of one outcome into several different outcomes. A trivial fine-graining is one where the resulting outcomes do not have independent probabilities, or more formally, where the vectors representing the respective effects are proportional to the effect vector associated with the original coarse-grained outcome.

The restriction to minimizing over $\mathcal{M}^*$ is important.
If one allowed coarse-grained measurements the entropy 
could always be reduced arbitrarily by grouping out- 
comes together into single outcomes. It is natural to 
draw the line at trivial fine-grainings since no more in- 
formation is yielded by them. 

The entropy $H(P)$ can be interpreted as the minimal uncertainty that is associated with the outcome of a maximally informative measurement. It has some appealing properties: (i) $H$ reduces to the Shannon entropy for classical probability theory and to the von Neumann entropy in quantum theory; (ii) suppose that the minimal number of outcomes for a fine-grained measurement in $\mathcal{M}^*$ is $d$; then for all states $P \in S$, $\log(d) \ge H(P) \ge 0$; and (iii) for any $P_1, P_2 \in S$ and any mixed state $P_{mix} = pP_1 + (1-p)P_2 \in S$: $H(P_{mix}) \ge pH(P_1) + (1-p)H(P_2)$ [17].
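
Properties (ii) and (iii) can be checked numerically in the classical case, where by property (i) the entropy is the Shannon entropy. The following Python sketch does so for one arbitrarily chosen pair of states of a classical system with $d = 3$; the states, the mixing weight and the names are illustrative assumptions.

```python
import math

def shannon(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def mix(P1, P2, p):
    """Convex mixture p * P1 + (1 - p) * P2 of two classical states."""
    return [p * a + (1 - p) * b for a, b in zip(P1, P2)]

d = 3
P1 = [1.0, 0.0, 0.0]        # a pure classical state
P2 = [0.2, 0.3, 0.5]        # a mixed classical state
p = 0.4

H_mix = shannon(mix(P1, P2, p))
H_average = p * shannon(P1) + (1 - p) * shannon(P2)
```

Here `H_mix` lies between 0 and $\log_2 d$ and is at least `H_average`, as properties (ii) and (iii) require.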
For a state $P_{AB}$ of a bipartite system AB one defines the conditional entropy of A conditioned on B by [17]

$$H(A|B)_{P_{AB}} := H(P_{AB}) - H(P_B), \qquad (3)$$

with $P_B$ the reduced state of $P_{AB}$. If there are no ambiguities we drop the indices and we write $H(A)$ instead of $H(P_A)$ and $H(AB)$ instead of $H(P_{AB})$, and so on.

Some properties that are satisfied in quantum theory 
(where this entropy reduces to the von Neumann en- 
tropy) are not necessarily satisfied for arbitrary theories. 
In box-world, for example, so-called strong subadditivity can be violated, as well as the subadditivity of the conditional entropy [17].

Data processing inequality. — The data processing 
inequality (DPI) is a crucial property of entropy measures which is frequently used in proofs in classical as well as quantum information theory [24, 25]. DPI quantifies the notion that local operations cannot increase correlations. A standard formulation for the classical case is that $H(X|Y) \le H(X|g(Y))$, where $X$ and $Y$ are random variables which may be correlated, $H(X|Y) := H(XY) - H(Y)$, and $g(Y)$ is a function of $Y$ only. The quantum DPI is the same, but with $H$ denoting the von Neumann entropy.
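
The classical formulation can be checked numerically. The Python sketch below draws random joint distributions of $(X, Y)$, applies a function $g$ to $Y$ only, and compares the two conditional entropies computed as $H(XY) - H(Y)$. The alphabet sizes, the choice $g(y) = y \bmod 2$, the fixed random seed and the function names are assumptions of this illustration.

```python
import math
import random

def shannon(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def conditional_entropy(joint):
    """H(X|Y) := H(XY) - H(Y) for joint = {(x, y): probability}."""
    p_y = {}
    for (_, y), q in joint.items():
        p_y[y] = p_y.get(y, 0.0) + q
    return shannon(joint.values()) - shannon(p_y.values())

def process(joint, g):
    """Joint distribution of (X, g(Y)), obtained by acting on Y only."""
    out = {}
    for (x, y), q in joint.items():
        key = (x, g(y))
        out[key] = out.get(key, 0.0) + q
    return out

def random_joint(rng, nx=3, ny=4):
    """Random joint distribution on an nx-by-ny alphabet."""
    weights = {(x, y): rng.random() for x in range(nx) for y in range(ny)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def g(y):
    return y % 2

rng = random.Random(0)
gaps = []
for _ in range(200):
    joint = random_joint(rng)
    # Gap H(X|g(Y)) - H(X|Y); DPI says it is never negative.
    gaps.append(conditional_entropy(process(joint, g)) - conditional_entropy(joint))
```

A simple extreme case is a perfectly correlated pair of bits with a constant $g$: then $H(X|Y) = 0$ while $H(X|g(Y)) = 1$.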

We will here use the following generalised definition of DPI due to Short and Wehner [17].

Definition 4 (Data Processing Inequality (DPI)). Consider two systems A and B. The data processing inequality is that for any allowed state $P_{AB} \in S_{AB}$ and for any allowed local transformation $T : P_B \to P'_B$,

$$H(A|B)_{P_{AB}} \le H(A|B')_{(\mathbb{1} \otimes T) P_{AB}}, \qquad (4)$$

where $H(\cdot|\cdot)$ denotes the conditional entropy of Eq. (3).

Main result. — Our main result links the data pro- 
cessing inequality with Tsirelson's bound. 

Theorem 1. In any general probabilistic theory where 
the Data Processing Inequality is respected, the Tsirelson 
bound is respected. 

Proof. We here sketch the proof; see the appendix for the details.

We use the fact that the entropy of Def. 3 satisfies two properties: (i) $H(A|B) := H(AB) - H(B)$ (we call this COND), and (ii) it reduces to the Shannon entropy for classical systems (we call this SHAN).

We prove that for any theory and entropy measure $H$ jointly satisfying COND, SHAN and DPI, Tsirelson's bound holds (where DPI has been defined using $H$). This implies the main theorem.

The three conditions are not trivially applicable to restrict the resource state in van Dam's game, so we use them, within the framework of probabilistic theories, to derive certain more directly applicable lemmas, including: (i) $\sum_i H(A_i|\gamma) \ge H(A|\gamma)$, where $A_i$ denotes the $i$-th party of a multi-party system A; (ii) $H(A) \ge H(A|B)$, with equality for product states; and (iii) for classical systems X, $H(X|Y) \ge 0$. With these lemmas and some additional arguments we show information causality is respected, and thus, by [14], Tsirelson's bound.

□

Discussion. — We have shown that the generalised 
DPI implies Tsirelson's bound. This addresses a question 
raised in [17], namely in what manner enforcing generalised entropic relations restricts the set of possible theories. It also contributes to our understanding of why Bell violations in quantum theory respect Tsirelson's bound.

As indicated in the proof sketch, our quantitative re- 
sults can be applied to more general entropy measures. 
In particular, for any entropy measure H and theory 
jointly satisfying COND, SHAN and DPI, we show that 
Tsirelson's bound holds. Thus one could alternatively 
have used for example the decomposition entropy of [17]
in the statement of the main theorem as it satisfies SHAN 
and is defined to satisfy COND [17]. At the same time 
one may argue that whilst an operationally appealing def- 
inition of conditional entropy should automatically sat- 
isfy SHAN and DPI it is not clear why it should in gen- 
eral satisfy COND. COND may then be viewed as a re- 
striction on states rather than a definition of conditional 
entropy. 

One can compare our three sufficient conditions COND, SHAN and DPI to those used in [14] and [18] respectively. The entropic relations used in [14] to derive information causality were formulated in terms of a conditional mutual information $I(A : B|C)$. (It is assumed this can be defined in a more general setting, but no definition is given.) The conditions are that $I(A : B|C)$ should: be symmetric under exchange of A and B, be non-negative ($I \ge 0$), reduce to the Shannon mutual information for classical systems, obey the Data Processing Inequality as formulated for mutual information, and obey the chain rule $I(A : B|C) = I(A : BC) - I(A : C)$. Arguably our three relations are more minimalistic and natural than those. Moreover we show the arguments apply to particular concrete definitions of entropy and that for at least two particular definitions of conditional entropy DPI alone suffices.

Consider secondly [18]. There concrete entropy definitions are proposed and studied. The definitions are very similar to those of [17], though the framework is not a priori exactly identical. They define three properties in terms of the conditional entropy $H(AB) - H(B)$, with $H$ the measurement entropy: (i) 'monoentropicity' (two particular different entropy measures always have the same value), (ii) a version of the Holevo bound, and (iii) 'strong sub-additivity' (defined below). They show that those conditions imply information causality. They moreover note that conditions (ii) and (iii) can be derived from DPI defined in terms of the above conditional (measurement) entropy (more correctly they define it using mutual information $I(A : B) := H(A) + H(B) - H(AB)$, but this is equivalent in this case). Assumption (i) is used to obtain what we here derive as Eq. (A10). Thus it appears one may alternatively summarise their result on information causality as follows: DPI (in terms of COND and measurement entropy) plus monoentropicity implies information causality. This can be compared to our Theorem 1; it is not so clear how to compare it to our more general Theorem 2, as the latter does not refer to a specific entropy measure, but to any state space and conditional entropy measure jointly satisfying DPI, COND and SHAN.

DPI is related to a condition known as strong subadditivity (SSA), which states that $H(A|CD) \le H(A|C)$. SSA is implied by DPI since forgetting D is an allowed local operation. In the quantum case SSA also implies DPI, but this does not necessarily hold in other theories, as the standard quantum proof relies on the specific quantum feature known as Stinespring dilation. In the extreme case of box-world it was already known that SSA (and thus also DPI) is violated [17]. As an example consider two classical bits $x^0$, $x^1$ and a gbit $Z$. The latter is a (2,2) system which can take any allowed distribution, i.e. its state space is the convex hull of four states wherein the two outcomes take defined values for each measurement. The classical bits are uniformly random but the gbit contains their values. Then $H(x^0|x^1 Z) = 1$ whereas $H(x^0|Z) = 0$, violating SSA [17].

It is an open question whether there are theories which 
satisfy DPI but have states not contained in quantum 
theory, since Tsirelson's $2 + \sqrt{2}$ bound is insufficient to
rule out all non-quantum states. Understanding this and 
with what DPI needs to be supplemented in order to 
derive quantum theory fully are natural next steps. 
Additional Note. — Similar results have been obtained

Appendix A: Proof of main theorem

The main theorem is a direct corollary of a more gen- 
eral theorem, Theorem 2, which we state and prove in 
this section. Crucially, Theorem 2 does not refer to a 
specific entropy measure such as the measurement en- 
tropy defined above. 

We require three definitions to state this theorem. 

Firstly we redefine DPI, now defined without reference 
to a specific entropy definition. 

Definition 5 (Data Processing Inequality (DPI)). Consider two systems A and B. The data processing inequality is that for any allowed state $P_{AB} \in S_{AB}$ and for any allowed local transformation $T : P_B \to P'_B$,

$$H(A|B)_{P_{AB}} \le H(A|B')_{(\mathbb{1} \otimes T) P_{AB}}. \qquad (A1)$$

Definition 6 (Conditional entropy (COND)). The conditional entropy $H(A|B)$, however it is defined, must for all allowed states on AB satisfy

$$H(A|B) = H(AB) - H(B). \qquad (A2)$$

Definition 7 (Reduction to Shannon entropy (SHAN)). 
The entropy H must reduce to the Shannon entropy for 
classical systems. 

Our statements are restricted to the generalised prob- 
abilistic framework, as described in the introduction to 
the paper. We shall be making use of two non-trivial but 
operationally well-motivated types of transformations associated with that framework: adding and removing systems. An (independent) system in state $P_B$ is added by the map taking any $P_A$ to $P_A \otimes P_B$. A system is removed by taking the marginal distribution on the other system(s), as described in the introduction. We shall make use of the fact that this map acts to take the removed system B to the vacuum system V. The only normalised state of the vacuum is $1_V = 1$ (this can be seen from the equivalent definition of the marginal state used elsewhere in the literature). Thus, and this is another equation we shall find useful, $P_A \otimes 1_V = P_A \;\; \forall P_A$.

We shall also be assuming that the entropy measure is 
operational, i.e. is uniquely determined by the statistics 
of the experiment under consideration. Thus it is for a 
given set-up determined by the state of the systems under 
consideration. More subtly, H moreover cannot depend 
on the order in which the state-spaces of the subsystems 
are composed, as this order is arbitrary; different observers describing the same experiment can make different choices here. Thus $H(AB)$ must be invariant under the interchange of systems A and B.

We are now ready to state the theorem: 

Theorem 2. For any probabilistic theory and en- 
tropy measure H satisfying COND, SHAN and DPI, 
Tsirelson's bound holds. 



Before proving Theorem 2 we note that the main the- 
orem (Theorem 1) is directly implied by this statement 
as the entropy H referred to there satisfies COND and 
SHAN. 

We first prove some lemmas which we shall need and which may be of interest in themselves.

Lemma 3. COND and DPI imply the relation

$$\sum_i H(A_i|\gamma) \ge H(A_1 \dots A_n|\gamma) \qquad (A3)$$

for any $P_{A_1 \dots A_n \gamma} \in S_{A_1 \dots A_n \gamma}$, where $A_i$ denotes the $i$-th party of the total system $A_1 \dots A_n$.

Proof. Consider firstly $n = 2$. By COND we have

$$H(A_1|\gamma) + H(A_2|\gamma) - H(A_1 A_2|\gamma) = -H(A_2|A_1\gamma) + H(A_2|\gamma).$$

By DPI this is greater than or equal to 0.

To generalise the argument to $n > 2$, let $A_2$ be replaced by $A_2 \dots A_n$ in the previous equation. Then by the same argument

$$H(A_1|\gamma) + H(A_2 \dots A_n|\gamma) - H(A_1 A_2 \dots A_n|\gamma) \ge 0.$$

Now we can apply the previous argument to the term $H(A_2 \dots A_n|\gamma)$ to get

$$H(A_2|\gamma) + H(A_3 \dots A_n|\gamma) \ge H(A_2 \dots A_n|\gamma).$$

This process is then repeated iteratively to recover $\sum_i H(A_i|\gamma) \ge H(A_1 \dots A_n|\gamma)$. □

Lemma 4. For product states $P_A \otimes P_B$, COND, SHAN and DPI imply the relation

$$H(A|B) = H(A). \qquad (A4)$$

Proof. We firstly use COND and SHAN to show that $H(A|V) = H(A)$ for any system A. This follows from the following:

$$\begin{aligned}
H(A|V) &= H(AV) - H(V) \qquad &(A5) \\
&= H(A) - 0 \qquad &(A6) \\
&= H(A). \qquad &(A7)
\end{aligned}$$

(Here COND implies the first line. As V is classical and has only one measurement outcome, SHAN implies $H(V) = 0$; moreover $P_{AV} = P_A$, as mentioned in the beginning of the appendix.)

We now prove the equality of the lemma by separately proving the two corresponding inequalities. Note firstly that

$$H(A) \ge H(A|B) \qquad (A8)$$

for any state. To see this, consider the transformation T that takes B to the vacuum system (i.e. the transformation that removes B as described in the introduction to the appendix). Then, using DPI,

$$H(A|B) \le H(A|T(B)) = H(A|V) = H(A).$$
Consider secondly the inequality in the other direction, restricting ourselves to the case of product states only:

$$H(A)_{P_A} \le H(A|B)_{P_A \otimes P_B}. \qquad (A9)$$

This is true because $H(A)_{P_A} = H(A|V)_{P_A \otimes 1_V} \le H(A|B)_{P_A \otimes P_B}$, where the last step uses DPI for the transformation that creates $P_B$ from the vacuum state (i.e. the transformation that adds B as described in the introduction to the appendix).

Combining Eqns. (A8) and (A9) proves the claim. □



Lemma 5. DPI, SHAN and COND imply that for all classical systems X,

$$H(X|Y) \ge 0. \qquad (A10)$$



Proof. To prove the lemma via DPI we shall use the fact that the extremal states of classical systems can be cloned [21]. More specifically, we shall make use of the fact that for a classical system $X_A$ in state $P_A = \sum_i p_i \mu_i$, where the $\mu_i$ are pure, and another classical system $X_B$ of the same dimensionality in any given independent pure state $\mu_k$, there exists a map $T_C$ such that $T_C(P_A \otimes P_B) = \sum_i p_i \mu_i \otimes \mu_i$.

We shall consider a three-party system $Y X_A X_B$, where Y is the only non-classical sub-system. The idea is that given an arbitrary state on $YX := Y X_A$, we can always bring in another independent subsystem $X_B$ and perform a cloning operation so that $X_B$ becomes a copy of $X_A$. We may then apply DPI on the cloning transformation $T_C$ applied on $X_A$ and $X_B$. We call the states before and after the cloning $P^i_{Y X_A X_B}$ and $P^f_{Y X_A X_B}$ respectively.

By DPI we then have

$$H(Y|X_A X_B)_{P^i_{Y X_A X_B}} \le H(Y|X_A X_B)_{P^f_{Y X_A X_B}}. \qquad (A11)$$

Note now that the left-hand side can be simplified. COND together with Eq. (A4) imply that $H(AB) = H(A) + H(B)$ for independently prepared A and B. This can be applied here because $X_B$ is initially in an independent state, yielding

$$H(Y|X_A X_B)_{P^i_{Y X_A X_B}} = H(Y|X_A)_{P^i_{Y X_A X_B}}.$$

Accordingly

$$H(Y|X_A)_{P^i_{Y X_A X_B}} \le H(Y|X_A X_B)_{P^f_{Y X_A X_B}}.$$

We also note that the marginal state on $Y X_A$ is unchanged by the cloning, i.e. $P^i_{Y X_A} = P^f_{Y X_A}$, so we may for simplicity write that for the state after the cloning,

$$H(Y|X_A) \le H(Y|X_A X_B). \qquad (A12)$$

In the following, unless stated otherwise, we consider the state after the cloning only.

Applying Eq. (A2), i.e. COND, to Eq. (A12) and undertaking some rearrangements yields

$$H(X_B|Y X_A) \ge H(X_B|X_A).$$

Moreover, SHAN implies that $H(X_B|X_A) = 0$. Thus $H(X_B|Y X_A) \ge 0$.

Note that since $X_A$ and $X_B$ are operationally indistinguishable after the cloning, $H(X_A|Y X_B) = H(X_B|Y X_A)$. Thus we have

$$H(X_A|Y X_B) \ge 0. \qquad (A13)$$

By DPI

$$H(X_A|Y) \ge H(X_A|Y X_B). \qquad (A14)$$

Thus, still after the cloning, we have that

$$H(X_A|Y) \ge 0. \qquad (A15)$$

But since the state of $X_A Y$ is unchanged by the cloning transformation, this implies that the inequality holds also for the (arbitrary) initial state of $X_A Y$. Recall that we used $X_A$ to label the classical system X. We have thus shown that $H(X|Y) \ge 0$ for an arbitrary initial state on XY. □

Lemma 6. COND, SHAN and DPI imply the relation

$$H(a|Bx) \ge n - m, \qquad (A16)$$

where the quantities are as defined in the information causality game ($a$ is the classical $n$-bit string given to Alice, $B$ is the non-classical resource and $x$ is the classical $m$-bit message sent to Bob).

Proof.

$$\begin{aligned}
H(a|Bx) - H(x|aB) &= -H(Bx) + H(aB) \\
&= -H(Bx) + H(a) + H(B) \\
&= H(a) - H(x|B) \\
&\ge H(a) - H(x) \\
&= n - H(x) \\
&\ge n - m.
\end{aligned}$$

The first line follows from COND. The second line is due to the combination of Eq. (A4) and Eq. (A2) and recalling that $a$ and $B$ are independent. The third line uses Eq. (A2) again. The fourth line follows from Eq. (A8). The fifth and sixth lines follow from the definition of the game as well as elementary properties of the Shannon entropy, which can be exploited due to SHAN.

It follows by applying Eq. (A10) to the left-hand side, which gives $H(x|aB) \ge 0$ since $x$ is classical, that $H(a|Bx) \ge n - m$. □

We now put together the pieces to prove Theorem 2: 

Proof of Theorem 2. By Lemma 6 above, we have

$$H(a|Bx) \ge n - m.$$

By Lemma 3 this implies

$$\sum_i H(a_i|Bx) \ge n - m.$$

By DPI we accordingly have that for Bob's guess $\beta = \beta(B, x, i)$

$$\sum_i H(a_i|\beta(i)) \ge n - m,$$

where, by SHAN and the fact that $a_i$ and $\beta(i)$ are both classical, $H$ refers to the Shannon entropy.

This implies information causality, as $I_{Sh}(a_i : \beta(i)) = H(a_i) - H(a_i|\beta(i))$, so

$$\begin{aligned}
\sum_i I_{Sh}(a_i : \beta(i)) &= \sum_i \left[ H(a_i) - H(a_i|\beta(i)) \right] \\
&= n - \sum_i H(a_i|\beta(i)) \\
&\le m.
\end{aligned}$$

Recall that information causality implies Tsirelson's bound. □


